CiteTracked: A Longitudinal Dataset of Peer Reviews and Citations
Scientific dissemination is of central importance to the scientific process. This paper presents CiteTracked, a dataset of peer reviews and citation statistics covering scientific papers from the machine learning community and spanning six years. We describe and analyze the collection of over 3,000 published papers, together with their peer review texts and citation counts, and outline possible uses of the data. The dataset aims to foster novel interdisciplinary work across fields such as scientometrics, information retrieval, computational linguistics, and natural language processing to study the scientific publishing process.